Overview

Dataset statistics

Number of variables: 25
Number of observations: 200
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 62.2 KiB
Average record size in memory: 318.4 B

Variable types

Numeric (NUM): 22
Categorical (CAT): 3

Reproduction

Analysis started: 2020-07-06 16:28:32.084093
Analysis finished: 2020-07-06 16:30:05.167739
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Name has a high cardinality: 135 distinct values [High cardinality]
AB is highly correlated with G and 3 other fields [High correlation]
G is highly correlated with AB and 2 other fields [High correlation]
PA is highly correlated with G and 3 other fields [High correlation]
H is highly correlated with G and 4 other fields [High correlation]
1B is highly correlated with H [High correlation]
R is highly correlated with AB and 2 other fields [High correlation]
2B has 3 (1.5%) zeros [Zeros]
3B has 47 (23.5%) zeros [Zeros]
HR has 8 (4.0%) zeros [Zeros]
BB has 3 (1.5%) zeros [Zeros]
IBB has 45 (22.5%) zeros [Zeros]
HBP has 26 (13.0%) zeros [Zeros]
SF has 21 (10.5%) zeros [Zeros]
SH has 137 (68.5%) zeros [Zeros]
GDP has 15 (7.5%) zeros [Zeros]
SB has 27 (13.5%) zeros [Zeros]
CS has 40 (20.0%) zeros [Zeros]
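
Each Zeros warning reports how many observations are exactly 0 and what share of the 200 rows they make up. For SH, that is 137 / 200 = 68.5%. The sketch below reproduces that arithmetic. It assumes a column is available as a plain Python list. `zeros_summary` is a hypothetical helper, not pandas-profiling's own code, and it does not reproduce the cut-off the profiler uses to decide when to raise the warning.

```python
def zeros_summary(values):
    """Return (number of exact zeros, percentage of all observations)."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100.0 * zeros / len(values)

# Hypothetical column with the same zero count as SH: 137 zeros in 200 rows.
sh_like = [0] * 137 + [1] * 63
count, pct = zeros_summary(sh_like)
print(f"{count} ({pct:.1f}%) zeros")  # 137 (68.5%) zeros
```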

Variables

df_index
Real number (ℝ≥0)

UNIQUE
Distinct count: 200
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 389.415
Minimum: 0
Maximum: 2060
Zeros: 1
Zeros (%): 0.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 94.95
Q1: 156.75
Median: 250
Q3: 415.25
95th percentile: 1674.35
Maximum: 2060
Range: 2060
Interquartile range (IQR): 258.5
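
The quantile statistics are consistent with linear interpolation between the two nearest sorted values, which is the default of pandas `Series.quantile`. Fractional results on integer data, such as Q1 = 156.75 here, point to that method. The stdlib sketch below assumes it. `quantile` and `quantile_summary` are hypothetical helpers.

```python
import math

def quantile(values, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between sorted values.

    Mirrors the default 'linear' interpolation of pandas Series.quantile.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def quantile_summary(values):
    """The quantile block of the report, computed from a list of numbers."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "Minimum": min(values),
        "5th percentile": quantile(values, 0.05),
        "Q1": q1,
        "Median": quantile(values, 0.5),
        "Q3": q3,
        "95th percentile": quantile(values, 0.95),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Interquartile range (IQR)": q3 - q1,
    }

for label, value in quantile_summary([1, 2, 3, 4, 10]).items():
    print(f"{label}: {value}")
```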

Descriptive statistics

Standard deviation: 416.776687
Coefficient of variation (CV): 1.070263567
Kurtosis: 6.921553054
Mean: 389.415
Mean absolute deviation (MAD): 261.31345
Skewness: 2.6537269
Sum: 77883
Variance: 173702.8068
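
The sketch below computes the descriptive statistics under pandas conventions (an assumption): sample variance with ddof=1, the adjusted Fisher-Pearson skewness of `Series.skew()`, and the bias-corrected excess kurtosis of `Series.kurt()`. The MAD figures in this report behave like a mean absolute deviation rather than a median absolute deviation. For an integer column such as 3B (median 2, 200 rows), the median of the absolute deviations would have to be a whole or half number, yet the report shows 1.9724. The sketch therefore computes both quantities. `describe` is a hypothetical helper and needs at least four non-constant values.

```python
import math
import statistics

def describe(values):
    """Descriptive statistics following pandas conventions (an assumption)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values for kurtosis")
    mean = sum(values) / n
    # Biased central moments, the building blocks of skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("constant data has no skewness or kurtosis")
    variance = m2 * n / (n - 1)          # sample variance (ddof=1)
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson skewness, as in pandas Series.skew().
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    # Bias-corrected excess kurtosis, as in pandas Series.kurt().
    g2 = m4 / m2 ** 2 - 3
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    median = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,                # coefficient of variation
        "skewness": skewness,
        "kurtosis": kurtosis,
        "sum": sum(values),
        # Mean absolute deviation around the mean (what the report's MAD resembles).
        "mean_abs_dev": sum(abs(x - mean) for x in values) / n,
        # Median absolute deviation around the median, for comparison.
        "median_abs_dev": statistics.median(abs(x - median) for x in values),
    }

stats = describe([1, 2, 3, 4, 100])
print(stats["mean_abs_dev"], stats["median_abs_dev"])  # 31.2 1
```

On skewed data the two MAD definitions diverge sharply. A single outlier pulls the mean absolute deviation up, while the median absolute deviation barely moves.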
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 94.5 224.5 442.5 783. 2060. ], "bayesian blocks" binning strategy used)
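
The fixed-size histogram splits the observed range into 10 equal-width bins. For df_index (0 to 2060), each bin is 206 wide. The sketch below assumes numpy-style bins: each bin is half-open, except the last, which also includes the maximum. `fixed_width_histogram` is a hypothetical helper. The variable-size ("bayesian blocks") histogram uses a different, data-adaptive algorithm, which this sketch does not attempt.

```python
def fixed_width_histogram(values, bins=10):
    """Return (bin edges, counts) for `bins` equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        # Bins are [edge_i, edge_i+1); the maximum is clamped into the last bin.
        i = int((x - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1
    return edges, counts

edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 2, 2, 2, 3]
```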

Common values
Value Count Frequency (%)
392 1 0.5%
 
154 1 0.5%
 
171 1 0.5%
 
170 1 0.5%
 
168 1 0.5%
 
167 1 0.5%
 
163 1 0.5%
 
161 1 0.5%
 
159 1 0.5%
 
157 1 0.5%
 
Other values (190) 190 95.0%
 
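
Each Common values table lists the most frequent values with their counts and percentages. It then folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values. For df_index, whose 200 values are all distinct, that is 190. The sketch below builds such a table with `collections.Counter`. `common_values` is a hypothetical helper. The order among equally frequent values (here, first occurrence) may differ from the report's.

```python
from collections import Counter

def common_values(values, top=10):
    """Rows of (value, count, percent) for the `top` values, plus an 'Other values' row."""
    n = len(values)
    counts = Counter(values)
    ranked = counts.most_common()        # sorted by count, ties by first occurrence
    rows = [(v, c, 100.0 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    return rows

rows = common_values(["a", "a", "a", "b", "b", "c", "d"], top=2)
print([r[:2] for r in rows])  # [('a', 3), ('b', 2), ('Other values (2)', 2)]
```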
Minimum 5 values
Value Count Frequency (%)
0 1 0.5%
 
1 1 0.5%
 
31 1 0.5%
 
43 1 0.5%
 
63 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
2060 1 0.5%
 
1989 1 0.5%
 
1959 1 0.5%
 
1945 1 0.5%
 
1928 1 0.5%
 

Season
Categorical

Distinct count: 3
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Common values
Value Count Frequency (%)
2019 70 35.0%
 
2017 67 33.5%
 
2018 63 31.5%
 

Length

Max length: 6
Mean length: 6
Min length: 6
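
The length statistics are the maximum, mean and minimum number of characters per value. Every Season value has length 6. That suggests the seasons are stored as strings such as "2019.0", and the sample rows do show 2018.0 and similar, but the storage format is an inference. A minimal sketch with a hypothetical `length_summary` helper:

```python
def length_summary(values):
    """Return (max length, mean length, min length) of the values as strings."""
    lengths = [len(str(v)) for v in values]
    return max(lengths), sum(lengths) / len(lengths), min(lengths)

print(length_summary(["Mets", "Red Sox", "Dodgers"]))  # (7, 6.0, 4)
```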
Unicode categories
Value Count Frequency (%)
Decimal_Number 6 85.7%
 
Other_Punctuation 1 14.3%
 
Unicode scripts
Value Count Frequency (%)
Common 7 100.0%
 
Unicode blocks
Value Count Frequency (%)
ASCII 7 100.0%
 
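
The three character tables group the distinct characters in a column by Unicode general category, script and block. The category counts can be reproduced with the standard `unicodedata.category` function, assuming distinct characters are what is counted. The Season values use six distinct digits (Decimal_Number) and one full stop (Other_Punctuation), which matches the 6 and 1 above. Script and block lookups are not in Python's standard library, so they are left out. `category_counts` and `CATEGORY_NAMES` are hypothetical names.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Po": "Other_Punctuation",
    "Zs": "Space_Separator",
}

def category_counts(values):
    """Count the distinct characters of all values per Unicode general category."""
    distinct_chars = set("".join(values))
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )

print(category_counts(["2019.0", "2017.0", "2018.0"]))
# Counter({'Decimal_Number': 6, 'Other_Punctuation': 1})
```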

Name
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 135
Unique (%): 67.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
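
The HIGH CARDINALITY flag rests on the distinct count: 135 distinct names in 200 rows gives 135 / 200 = 67.5% unique. The sketch below computes that ratio with a hypothetical `cardinality` helper. The threshold at which pandas-profiling raises the flag is set by its configuration and is not shown in this report.

```python
def cardinality(values):
    """Return (distinct count, distinct values as a percentage of all rows)."""
    distinct = len(set(values))
    return distinct, 100.0 * distinct / len(values)

# Hypothetical column shaped like Name: 135 distinct labels spread over 200 rows.
names = [f"player_{i}" for i in range(135)] + [f"player_{i}" for i in range(65)]
print(cardinality(names))  # (135, 67.5)
```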
Common values
Value Count Frequency (%)
Alex Bregman 3 1.5%
 
Aaron Judge 3 1.5%
 
J.T. Realmuto 3 1.5%
 
Eugenio Suarez 3 1.5%
 
Mookie Betts 3 1.5%
 
Jose Altuve 3 1.5%
 
Freddie Freeman 3 1.5%
 
Christian Yelich 3 1.5%
 
Mike Trout 3 1.5%
 
Justin Turner 3 1.5%
 
Other values (125) 170 85.0%
 

Length

Max length: 18
Mean length: 12.86
Min length: 9
Unicode categories
Value Count Frequency (%)
Lowercase_Letter 25 48.1%
 
Uppercase_Letter 24 46.2%
 
Dash_Punctuation 1 1.9%
 
Other_Punctuation 1 1.9%
 
Space_Separator 1 1.9%
 
Unicode scripts
Value Count Frequency (%)
Latin 49 94.2%
 
Common 3 5.8%
 
Unicode blocks
Value Count Frequency (%)
ASCII 52 100.0%
 

Team
Categorical

Distinct count: 30
Unique (%): 15.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Common values
Value Count Frequency (%)
Dodgers 16 8.0%
 
Astros 14 7.0%
 
Yankees 13 6.5%
 
Athletics 11 5.5%
 
Nationals 11 5.5%
 
Red Sox 10 5.0%
 
Braves 10 5.0%
 
Brewers 8 4.0%
 
Mets 8 4.0%
 
Twins 7 3.5%
 
Other values (20) 92 46.0%
 

Length

Max length: 12
Mean length: 6.855
Min length: 4
Unicode categories
Value Count Frequency (%)
Lowercase_Letter 21 53.8%
 
Uppercase_Letter 16 41.0%
 
Dash_Punctuation 1 2.6%
 
Space_Separator 1 2.6%
 
Unicode scripts
Value Count Frequency (%)
Latin 37 94.9%
 
Common 2 5.1%
 
Unicode blocks
Value Count Frequency (%)
ASCII 39 100.0%
 

Age
Real number (ℝ≥0)

Distinct count: 20
Unique (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.3
Minimum: 19.0
Maximum: 38.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 19
5th percentile: 22
Q1: 25
Median: 27
Q3: 29
95th percentile: 33
Maximum: 38
Range: 19
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.372963745
Coefficient of variation (CV): 0.1235517855
Kurtosis: 0.3756130043
Mean: 27.3
Mean absolute deviation (MAD): 2.613
Skewness: 0.3710304691
Sum: 5460
Variance: 11.37688442
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[19. 23.5 30.5 33.5 38. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
27 29 14.5%
 
28 25 12.5%
 
25 23 11.5%
 
26 21 10.5%
 
29 18 9.0%
 
30 16 8.0%
 
24 16 8.0%
 
23 9 4.5%
 
31 8 4.0%
 
33 7 3.5%
 
Other values (10) 28 14.0%
 
Minimum 5 values
Value Count Frequency (%)
19 1 0.5%
 
20 3 1.5%
 
21 3 1.5%
 
22 6 3.0%
 
23 9 4.5%
 
Maximum 5 values
Value Count Frequency (%)
38 1 0.5%
 
37 1 0.5%
 
36 1 0.5%
 
35 3 1.5%
 
34 3 1.5%
 

G
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 71
Unique (%): 35.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 131.47
Minimum: 2.0
Maximum: 162.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 2
5th percentile: 30.95
Q1: 127.75
Median: 144
Q3: 156
95th percentile: 161.05
Maximum: 162
Range: 160
Interquartile range (IQR): 28.25

Descriptive statistics

Standard deviation: 38.0350156
Coefficient of variation (CV): 0.2893056637
Kurtosis: 3.247242445
Mean: 131.47
Mean absolute deviation (MAD): 26.2538
Skewness: -2.022998863
Sum: 26294
Variance: 1446.662412
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 2. 110. 133.5 154.5 159.5 161.5 162. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
158 12 6.0%
 
156 11 5.5%
 
162 10 5.0%
 
159 9 4.5%
 
155 8 4.0%
 
140 7 3.5%
 
157 7 3.5%
 
145 7 3.5%
 
134 6 3.0%
 
144 6 3.0%
 
Other values (61) 117 58.5%
 
Minimum 5 values
Value Count Frequency (%)
2 1 0.5%
 
5 1 0.5%
 
8 1 0.5%
 
17 1 0.5%
 
18 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
162 10 5.0%
 
161 1 0.5%
 
160 3 1.5%
 
159 9 4.5%
 
158 12 6.0%
 

AB
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 145
Unique (%): 72.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 483.565
Minimum: 1.0
Maximum: 664.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 52.9
Q1: 455
Median: 538
Q3: 584.25
95th percentile: 639.05
Maximum: 664
Range: 663
Interquartile range (IQR): 129.25

Descriptive statistics

Standard deviation: 163.8260684
Coefficient of variation (CV): 0.3387881016
Kurtosis: 2.393620265
Mean: 483.565
Mean absolute deviation (MAD): 114.8929
Skewness: -1.788036636
Sum: 96713
Variance: 26838.98068
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 74.5 369. 470.5 602.5 664. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
597 4 2.0%
 
534 4 2.0%
 
559 3 1.5%
 
480 3 1.5%
 
632 3 1.5%
 
575 3 1.5%
 
549 3 1.5%
 
602 3 1.5%
 
455 2 1.0%
 
556 2 1.0%
 
Other values (135) 170 85.0%
 
Minimum 5 values
Value Count Frequency (%)
1 2 1.0%
 
6 1 0.5%
 
10 1 0.5%
 
31 1 0.5%
 
40 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
664 1 0.5%
 
661 1 0.5%
 
657 1 0.5%
 
653 1 0.5%
 
651 1 0.5%
 

PA
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 149
Unique (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 551.96
Minimum: 1.0
Maximum: 747.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 64.95
Q1: 520.75
Median: 612.5
Q3: 666.25
95th percentile: 707
Maximum: 747
Range: 746
Interquartile range (IQR): 145.5

Descriptive statistics

Standard deviation: 184.7489357
Coefficient of variation (CV): 0.3347143556
Kurtosis: 2.574828223
Mean: 551.96
Mean absolute deviation (MAD): 128.5132
Skewness: -1.852868002
Sum: 110392
Variance: 34132.16925
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 82. 360. 481.5 630. 708. 747. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
645 5 2.5%
 
632 4 2.0%
 
606 3 1.5%
 
695 3 1.5%
 
707 3 1.5%
 
679 3 1.5%
 
705 3 1.5%
 
662 3 1.5%
 
481 3 1.5%
 
1 2 1.0%
 
Other values (139) 168 84.0%
 
Minimum 5 values
Value Count Frequency (%)
1 2 1.0%
 
8 1 0.5%
 
10 1 0.5%
 
38 1 0.5%
 
47 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
747 1 0.5%
 
745 1 0.5%
 
740 1 0.5%
 
725 1 0.5%
 
723 1 0.5%
 

H
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 103
Unique (%): 51.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 138.745
Minimum: 1.0
Maximum: 213.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 9
Q1: 128.75
Median: 152.5
Q3: 170
95th percentile: 190.05
Maximum: 213
Range: 212
Interquartile range (IQR): 41.25

Descriptive statistics

Standard deviation: 49.80090991
Coefficient of variation (CV): 0.3589384116
Kurtosis: 1.91590466
Mean: 138.745
Mean absolute deviation (MAD): 35.3066
Skewness: -1.589651781
Sum: 27749
Variance: 2480.130628
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 13.5 97.5 130.5 191.5 213. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
170 5 2.5%
 
175 5 2.5%
 
176 5 2.5%
 
161 5 2.5%
 
155 4 2.0%
 
136 4 2.0%
 
149 4 2.0%
 
187 4 2.0%
 
153 4 2.0%
 
156 4 2.0%
 
Other values (93) 156 78.0%
 
Minimum 5 values
Value Count Frequency (%)
1 2 1.0%
 
3 1 0.5%
 
5 1 0.5%
 
6 1 0.5%
 
7 2 1.0%
 
Maximum 5 values
Value Count Frequency (%)
213 1 0.5%
 
204 1 0.5%
 
201 2 1.0%
 
197 1 0.5%
 
192 2 1.0%
 

1B
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 84
Unique (%): 42.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.025
Minimum: 0.0
Maximum: 170.0
Zeros: 2
Zeros (%): 1.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 6
Q1: 70
Median: 88
Q3: 101.25
95th percentile: 125.1
Maximum: 170
Range: 170
Interquartile range (IQR): 31.25

Descriptive statistics

Standard deviation: 32.51662698
Coefficient of variation (CV): 0.3964233707
Kurtosis: 0.893124625
Mean: 82.025
Mean absolute deviation (MAD): 24.021
Skewness: -0.9113552855
Sum: 16405
Variance: 1057.33103
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 8.5 56.5 114.5 136.5 170. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
91 6 3.0%
 
95 5 2.5%
 
78 5 2.5%
 
97 5 2.5%
 
99 5 2.5%
 
83 5 2.5%
 
93 5 2.5%
 
104 5 2.5%
 
6 4 2.0%
 
70 4 2.0%
 
Other values (74) 151 75.5%
 
Minimum 5 values
Value Count Frequency (%)
0 2 1.0%
 
2 1 0.5%
 
3 1 0.5%
 
4 4 2.0%
 
6 4 2.0%
 
Maximum 5 values
Value Count Frequency (%)
170 1 0.5%
 
137 1 0.5%
 
136 2 1.0%
 
135 1 0.5%
 
134 1 0.5%
 

2B
Real number (ℝ≥0)

ZEROS
Distinct count: 46
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.86
Minimum: 0.0
Maximum: 56.0
Zeros: 3
Zeros (%): 1.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 24
Median: 30
Q3: 36.25
95th percentile: 44
Maximum: 56
Range: 56
Interquartile range (IQR): 12.25

Descriptive statistics

Standard deviation: 11.91765718
Coefficient of variation (CV): 0.4129472344
Kurtosis: 0.4997351294
Mean: 28.86
Mean absolute deviation (MAD): 8.961
Skewness: -0.7473180384
Sum: 5772
Variance: 142.0305528
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 2.5 15.5 24.5 44.5 56. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
34 13 6.5%
 
26 12 6.0%
 
29 10 5.0%
 
27 9 4.5%
 
38 9 4.5%
 
33 9 4.5%
 
25 8 4.0%
 
30 7 3.5%
 
35 7 3.5%
 
40 7 3.5%
 
Other values (36) 109 54.5%
 
Minimum 5 values
Value Count Frequency (%)
0 3 1.5%
 
1 6 3.0%
 
2 5 2.5%
 
3 3 1.5%
 
5 2 1.0%
 
Maximum 5 values
Value Count Frequency (%)
56 1 0.5%
 
54 1 0.5%
 
52 1 0.5%
 
51 1 0.5%
 
49 1 0.5%
 

3B
Real number (ℝ≥0)

ZEROS
Distinct count: 12
Unique (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.66
Minimum: 0.0
Maximum: 14.0
Zeros: 47
Zeros (%): 23.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 4
95th percentile: 7
Maximum: 14
Range: 14
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.491069476
Coefficient of variation (CV): 0.9364922843
Kurtosis: 1.804831109
Mean: 2.66
Mean absolute deviation (MAD): 1.9724
Skewness: 1.179952196
Sum: 532
Variance: 6.205427136
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 3.5 7.5 14. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 47 23.5%
 
2 39 19.5%
 
1 28 14.0%
 
3 25 12.5%
 
5 18 9.0%
 
4 17 8.5%
 
6 10 5.0%
 
7 8 4.0%
 
9 3 1.5%
 
10 2 1.0%
 
Other values (2) 3 1.5%
 
Minimum 5 values
Value Count Frequency (%)
0 47 23.5%
 
1 28 14.0%
 
2 39 19.5%
 
3 25 12.5%
 
4 17 8.5%
 
Maximum 5 values
Value Count Frequency (%)
14 1 0.5%
 
10 2 1.0%
 
9 3 1.5%
 
8 2 1.0%
 
7 8 4.0%
 

HR
Real number (ℝ≥0)

ZEROS
Distinct count: 48
Unique (%): 24.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.2
Minimum: 0.0
Maximum: 59.0
Zeros: 8
Zeros (%): 4.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 18
Median: 26
Q3: 34
95th percentile: 41.1
Maximum: 59
Range: 59
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.28268381
Coefficient of variation (CV): 0.4874080877
Kurtosis: -0.1622567655
Mean: 25.2
Mean absolute deviation (MAD): 9.672
Skewness: -0.3538696609
Sum: 5040
Variance: 150.8643216
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 20.5 40. 59. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
24 12 6.0%
 
23 10 5.0%
 
33 10 5.0%
 
27 9 4.5%
 
34 8 4.0%
 
36 8 4.0%
 
0 8 4.0%
 
25 7 3.5%
 
32 7 3.5%
 
35 7 3.5%
 
Other values (38) 114 57.0%
 
Minimum 5 values
Value Count Frequency (%)
0 8 4.0%
 
1 5 2.5%
 
2 5 2.5%
 
4 3 1.5%
 
6 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
59 1 0.5%
 
53 1 0.5%
 
52 1 0.5%
 
49 1 0.5%
 
48 1 0.5%
 

R
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 78
Unique (%): 39.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.055
Minimum: 1.0
Maximum: 137.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 4.95
Q1: 73.75
Median: 88
Q3: 101
95th percentile: 121.05
Maximum: 137
Range: 136
Interquartile range (IQR): 27.25

Descriptive statistics

Standard deviation: 30.65797666
Coefficient of variation (CV): 0.3736271605
Kurtosis: 1.426417147
Mean: 82.055
Mean absolute deviation (MAD): 21.8018
Skewness: -1.284904992
Sum: 16411
Variance: 939.9115327
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 6.5 59.5 74.5 112.5 137. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
85 6 3.0%
 
100 6 3.0%
 
89 6 3.0%
 
102 5 2.5%
 
96 5 2.5%
 
84 5 2.5%
 
88 5 2.5%
 
95 5 2.5%
 
77 5 2.5%
 
75 5 2.5%
 
Other values (68) 147 73.5%
 
Minimum 5 values
Value Count Frequency (%)
1 3 1.5%
 
2 2 1.0%
 
3 2 1.0%
 
4 3 1.5%
 
5 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
137 1 0.5%
 
135 1 0.5%
 
129 3 1.5%
 
128 1 0.5%
 
127 1 0.5%
 

RBI
Real number (ℝ≥0)

Distinct count: 85
Unique (%): 42.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.815
Minimum: 0.0
Maximum: 132.0
Zeros: 2
Zeros (%): 1.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 67
Median: 81
Q3: 97.25
95th percentile: 118
Maximum: 132
Range: 132
Interquartile range (IQR): 30.25

Descriptive statistics

Standard deviation: 30.73931199
Coefficient of variation (CV): 0.4001732993
Kurtosis: 0.6763158341
Mean: 76.815
Mean absolute deviation (MAD): 22.84645
Skewness: -0.970720634
Sum: 15363
Variance: 944.9053015
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 9.5 48. 66. 106. 132. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
67 6 3.0%
 
90 6 3.0%
 
92 6 3.0%
 
79 5 2.5%
 
104 5 2.5%
 
72 5 2.5%
 
100 4 2.0%
 
87 4 2.0%
 
63 4 2.0%
 
93 4 2.0%
 
Other values (75) 151 75.5%
 
Minimum 5 values
Value Count Frequency (%)
0 2 1.0%
 
1 2 1.0%
 
2 2 1.0%
 
3 2 1.0%
 
4 3 1.5%
 
Maximum 5 values
Value Count Frequency (%)
132 1 0.5%
 
130 2 1.0%
 
126 1 0.5%
 
124 1 0.5%
 
121 1 0.5%
 

BB
Real number (ℝ≥0)

ZEROS
Distinct count: 84
Unique (%): 42.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.685
Minimum: 0.0
Maximum: 134.0
Zeros: 3
Zeros (%): 1.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 2.95
Q1: 40.75
Median: 59.5
Q3: 73
95th percentile: 106.1
Maximum: 134
Range: 134
Interquartile range (IQR): 32.25

Descriptive statistics

Standard deviation: 27.928584
Coefficient of variation (CV): 0.4841567825
Kurtosis: 0.1383917539
Mean: 57.685
Mean absolute deviation (MAD): 21.63075
Skewness: -0.04593102627
Sum: 11537
Variance: 780.005804
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 6.5 29.5 80.5 109.5 134. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
70 7 3.5%
 
73 6 3.0%
 
52 5 2.5%
 
55 5 2.5%
 
48 5 2.5%
 
47 5 2.5%
 
35 5 2.5%
 
2 5 2.5%
 
76 5 2.5%
 
64 5 2.5%
 
Other values (74) 147 73.5%
 
Minimum 5 values
Value Count Frequency (%)
0 3 1.5%
 
1 2 1.0%
 
2 5 2.5%
 
3 1 0.5%
 
4 3 1.5%
 
Maximum 5 values
Value Count Frequency (%)
134 1 0.5%
 
130 1 0.5%
 
127 1 0.5%
 
122 1 0.5%
 
119 1 0.5%
 

IBB
Real number (ℝ≥0)

ZEROS
Distinct count: 22
Unique (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.31
Minimum: 0.0
Maximum: 25.0
Zeros: 45
Zeros (%): 22.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 6
95th percentile: 14.05
Maximum: 25
Range: 25
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.773660985
Coefficient of variation (CV): 1.107577955
Kurtosis: 2.452472884
Mean: 4.31
Mean absolute deviation (MAD): 3.6499
Skewness: 1.560946933
Sum: 862
Variance: 22.7878392
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 3.5 6.5 16.5 25. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 45 22.5%
 
2 26 13.0%
 
1 24 12.0%
 
3 21 10.5%
 
4 13 6.5%
 
6 13 6.5%
 
5 13 6.5%
 
11 8 4.0%
 
8 6 3.0%
 
9 6 3.0%
 
Other values (12) 25 12.5%
 
Minimum 5 values
Value Count Frequency (%)
0 45 22.5%
 
1 24 12.0%
 
2 26 13.0%
 
3 21 10.5%
 
4 13 6.5%
 
Maximum 5 values
Value Count Frequency (%)
25 1 0.5%
 
21 1 0.5%
 
20 1 0.5%
 
18 1 0.5%
 
17 1 0.5%
 

SO
Real number (ℝ≥0)

Distinct count: 114
Unique (%): 57.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.335
Minimum: 0.0
Maximum: 211.0
Zeros: 2
Zeros (%): 1.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 18.9
Q1: 84.75
Median: 110.5
Q3: 137.25
95th percentile: 174.1
Maximum: 211
Range: 211
Interquartile range (IQR): 52.5

Descriptive statistics

Standard deviation: 43.09581209
Coefficient of variation (CV): 0.4015075427
Kurtosis: 0.2223796948
Mean: 107.335
Mean absolute deviation (MAD): 33.0782
Skewness: -0.4909702164
Sum: 21467
Variance: 1857.24902
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 65.5 151.5 211. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
131 5 2.5%
 
101 4 2.0%
 
128 4 2.0%
 
82 4 2.0%
 
111 4 2.0%
 
99 4 2.0%
 
108 4 2.0%
 
104 4 2.0%
 
147 3 1.5%
 
146 3 1.5%
 
Other values (104) 161 80.5%
 
Minimum 5 values
Value Count Frequency (%)
0 2 1.0%
 
2 1 0.5%
 
3 1 0.5%
 
8 2 1.0%
 
13 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
211 1 0.5%
 
208 1 0.5%
 
189 1 0.5%
 
188 1 0.5%
 
183 1 0.5%
 

HBP
Real number (ℝ≥0)

ZEROS
Distinct count: 24
Unique (%): 12.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.005
Minimum: 0.0
Maximum: 27.0
Zeros: 26
Zeros (%): 13.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 3
Median: 5
Q3: 8
95th percentile: 15
Maximum: 27
Range: 27
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.874812198
Coefficient of variation (CV): 0.8117922062
Kurtosis: 2.89583734
Mean: 6.005
Mean absolute deviation (MAD): 3.646
Skewness: 1.397925682
Sum: 1201
Variance: 23.76379397
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 8.5 12.5 27. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 26 13.0%
 
3 25 12.5%
 
4 22 11.0%
 
8 21 10.5%
 
5 16 8.0%
 
7 14 7.0%
 
6 14 7.0%
 
2 12 6.0%
 
9 10 5.0%
 
10 9 4.5%
 
Other values (14) 31 15.5%
 
Minimum 5 values
Value Count Frequency (%)
0 26 13.0%
 
1 5 2.5%
 
2 12 6.0%
 
3 25 12.5%
 
4 22 11.0%
 
Maximum 5 values
Value Count Frequency (%)
27 1 0.5%
 
24 1 0.5%
 
22 1 0.5%
 
21 2 1.0%
 
20 1 0.5%
 

SF
Real number (ℝ≥0)

ZEROS
Distinct count: 12
Unique (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.69
Minimum: 0.0
Maximum: 12.0
Zeros: 21
Zeros (%): 10.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 3
Q3: 5
95th percentile: 8
Maximum: 12
Range: 12
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.466641767
Coefficient of variation (CV): 0.6684666036
Kurtosis: 0.04590085378
Mean: 3.69
Mean absolute deviation (MAD): 1.9845
Skewness: 0.5709021895
Sum: 738
Variance: 6.084321608
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 6.5 8.5 12. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
3 36 18.0%
 
2 33 16.5%
 
4 30 15.0%
 
6 23 11.5%
 
0 21 10.5%
 
5 16 8.0%
 
1 15 7.5%
 
7 11 5.5%
 
8 7 3.5%
 
9 4 2.0%
 
Other values (2) 4 2.0%
 
Minimum 5 values
Value Count Frequency (%)
0 21 10.5%
 
1 15 7.5%
 
2 33 16.5%
 
3 36 18.0%
 
4 30 15.0%
 
Maximum 5 values
Value Count Frequency (%)
12 1 0.5%
 
10 3 1.5%
 
9 4 2.0%
 
8 7 3.5%
 
7 11 5.5%
 

SH
Real number (ℝ≥0)

ZEROS
Distinct count: 11
Unique (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.86
Minimum: 0.0
Maximum: 14.0
Zeros: 137
Zeros (%): 68.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 5
Maximum: 14
Range: 14
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.034970645
Coefficient of variation (CV): 2.366244937
Kurtosis: 17.9826266
Mean: 0.86
Mean absolute deviation (MAD): 1.1782
Skewness: 3.892680432
Sum: 172
Variance: 4.141105528
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 3.5 14. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 137 68.5%
 
1 30 15.0%
 
2 13 6.5%
 
3 7 3.5%
 
5 5 2.5%
 
6 2 1.0%
 
12 2 1.0%
 
7 1 0.5%
 
14 1 0.5%
 
4 1 0.5%
 
Minimum 5 values
Value Count Frequency (%)
0 137 68.5%
 
1 30 15.0%
 
2 13 6.5%
 
3 7 3.5%
 
4 1 0.5%
 
Maximum 5 values
Value Count Frequency (%)
14 1 0.5%
 
12 2 1.0%
 
9 1 0.5%
 
7 1 0.5%
 
6 2 1.0%
 

GDP
Real number (ℝ≥0)

ZEROS
Distinct count: 25
Unique (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.035
Minimum: 0.0
Maximum: 26.0
Zeros: 15
Zeros (%): 7.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 6
Median: 10
Q3: 14
95th percentile: 20
Maximum: 26
Range: 26
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.853639936
Coefficient of variation (CV): 0.5833223653
Kurtosis: -0.5729307053
Mean: 10.035
Mean absolute deviation (MAD): 4.72675
Skewness: 0.07589503881
Sum: 2007
Variance: 34.2651005
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 6.5 15.5 21.5 26. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
11 16 8.0%
 
8 15 7.5%
 
0 15 7.5%
 
9 13 6.5%
 
10 13 6.5%
 
12 13 6.5%
 
5 12 6.0%
 
7 11 5.5%
 
13 10 5.0%
 
14 10 5.0%
 
Other values (15) 72 36.0%
 
Minimum 5 values
Value Count Frequency (%)
0 15 7.5%
 
1 7 3.5%
 
2 5 2.5%
 
3 4 2.0%
 
4 6 3.0%
 
Maximum 5 values
Value Count Frequency (%)
26 1 0.5%
 
23 1 0.5%
 
22 1 0.5%
 
21 4 2.0%
 
20 6 3.0%
 

SB
Real number (ℝ≥0)

ZEROS
Distinct count: 39
Unique (%): 19.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.645
Minimum: 0.0
Maximum: 60.0
Zeros: 27
Zeros (%): 13.5%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 6
Q3: 14.25
95th percentile: 30.1
Maximum: 60
Range: 60
Interquartile range (IQR): 12.25

Descriptive statistics

Standard deviation: 10.378223
Coefficient of variation (CV): 1.076021047
Kurtosis: 3.457095519
Mean: 9.645
Mean absolute deviation (MAD): 7.85335
Skewness: 1.712707743
Sum: 1929
Variance: 107.7075126
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 5.5 17.5 34.5 60. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 27 13.5%
 
5 16 8.0%
 
2 16 8.0%
 
4 15 7.5%
 
3 14 7.0%
 
10 12 6.0%
 
1 11 5.5%
 
8 7 3.5%
 
6 7 3.5%
 
16 7 3.5%
 
Other values (29) 68 34.0%
 
Minimum 5 values
Value Count Frequency (%)
0 27 13.5%
 
1 11 5.5%
 
2 16 8.0%
 
3 14 7.0%
 
4 15 7.5%
 
Maximum 5 values
Value Count Frequency (%)
60 1 0.5%
 
45 1 0.5%
 
43 1 0.5%
 
40 2 1.0%
 
37 1 0.5%
 

CS
Real number (ℝ≥0)

ZEROS
Distinct count: 15
Unique (%): 7.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.925
Minimum: 0.0
Maximum: 16.0
Zeros: 40
Zeros (%): 20.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 4.25
95th percentile: 9
Maximum: 16
Range: 16
Interquartile range (IQR): 3.25

Descriptive statistics

Standard deviation: 2.903710155
Coefficient of variation (CV): 0.9927214205
Kurtosis: 2.737067564
Mean: 2.925
Mean absolute deviation (MAD): 2.223
Skewness: 1.499501005
Sum: 585
Variance: 8.431532663
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 6.5 10.5 16. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
2 41 20.5%
 
0 40 20.0%
 
1 35 17.5%
 
3 20 10.0%
 
5 18 9.0%
 
4 14 7.0%
 
6 10 5.0%
 
7 7 3.5%
 
9 4 2.0%
 
10 4 2.0%
 
Other values (5) 7 3.5%
 
Minimum 5 values
Value Count Frequency (%)
0 40 20.0%
 
1 35 17.5%
 
2 41 20.5%
 
3 20 10.0%
 
4 14 7.0%
 
Maximum 5 values
Value Count Frequency (%)
16 1 0.5%
 
14 1 0.5%
 
12 1 0.5%
 
11 1 0.5%
 
10 4 2.0%
 

AVG
Real number (ℝ≥0)

Distinct count: 93
Unique (%): 46.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.289445
Minimum: 0.125
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.7 KiB

Quantile statistics

Minimum: 0.125
5th percentile: 0.1995
Q1: 0.26775
Median: 0.2855
Q3: 0.30425
95th percentile: 0.32905
Maximum: 1
Range: 0.875
Interquartile range (IQR): 0.0365

Descriptive statistics

Standard deviation: 0.08300540017
Coefficient of variation (CV): 0.2867743446
Kurtosis: 53.31872608
Mean: 0.289445
Mean absolute deviation (MAD): 0.0338306
Skewness: 6.338494587
Sum: 57.889
Variance: 0.006889896457
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.125 0.24 0.2595 0.3195 0.333 1. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0.273 6 3.0%
 
0.278 6 3.0%
 
0.295 6 3.0%
 
0.29 5 2.5%
 
0.297 5 2.5%
 
0.288 5 2.5%
 
0.281 4 2.0%
 
0.291 4 2.0%
 
0.284 4 2.0%
 
0.26 4 2.0%
 
Other values (83) 151 75.5%
 
Minimum 5 values
Value Count Frequency (%)
0.125 1 0.5%
 
0.143 1 0.5%
 
0.157 1 0.5%
 
0.163 1 0.5%
 
0.175 2 1.0%
 
Maximum 5 values
Value Count Frequency (%)
1 2 1.0%
 
0.5 2 1.0%
 
0.346 2 1.0%
 
0.335 1 0.5%
 
0.331 1 0.5%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign). As a consequence, for an exact linear relationship the slope of the line does not affect the magnitude of r: any positive slope gives r = +1 and any negative slope gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
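
The definition translates directly into Python. The 1/(n-1) factors in the sample covariance and in the two standard deviations cancel, so plain sums are enough. `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
print(pearson_r(x, y))                        # ≈ 0.6
print(pearson_r([3 * a + 7 for a in x], y))   # same value: location/scale invariance
```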

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
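
Following that recipe, the sketch converts each variable to ranks and then applies Pearson's formula to the ranks. Tied values share the average of their positions. The helper names are hypothetical, and `pearson_r` is repeated so the block runs on its own.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]              # monotonic but not linear
print(spearman_rho(x, y))      # ≈ 1.0
print(pearson_r(x, y))         # ≈ 0.984
```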

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n-1)/2 for n observations.
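
Below is a brute-force sketch of that definition, often called τ_a. Pairs tied in X or Y count as neither concordant nor discordant. Libraries commonly report the τ_b variant instead, which adjusts the denominator for ties; scipy.stats.kendalltau does so by default. Results can therefore differ on data with many ties, such as the count columns in this report. `kendall_tau_a` is a hypothetical helper.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1      # both variables move in the same direction
        elif s < 0:
            discordant += 1      # the variables move in opposite directions
        # s == 0: tied in x or y, counted as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # ≈ 0.333 (2 concordant, 1 discordant, 3 pairs)
```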

Missing values

Sample

First rows

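(row) | df_index | Season | Name | Team | Age | G | AB | PA | H | 1B | 2B | 3B | HR | R | RBI | BB | IBB | SO | HBP | SF | SH | GDP | SB | CS | AVG
0 | 63 | 2018.0 | Mookie Betts | Red Sox | 25.0 | 136.0 | 520.0 | 614.0 | 180.0 | 96.0 | 47.0 | 5.0 | 32.0 | 129.0 | 80.0 | 81.0 | 8.0 | 91.0 | 8.0 | 5.0 | 0.0 | 5.0 | 30.0 | 6.0 | 0.346
1 | 67 | 2018.0 | Mike Trout | Angels | 26.0 | 140.0 | 471.0 | 608.0 | 147.0 | 80.0 | 24.0 | 4.0 | 39.0 | 101.0 | 79.0 | 122.0 | 25.0 | 124.0 | 10.0 | 4.0 | 0.0 | 5.0 | 24.0 | 2.0 | 0.312
2 | 86 | 2019.0 | Mike Trout | Angels | 27.0 | 134.0 | 470.0 | 600.0 | 137.0 | 63.0 | 27.0 | 2.0 | 45.0 | 110.0 | 104.0 | 110.0 | 14.0 | 120.0 | 16.0 | 4.0 | 0.0 | 5.0 | 11.0 | 2.0 | 0.291
3 | 101 | 2019.0 | Alex Bregman | Astros | 25.0 | 156.0 | 554.0 | 690.0 | 164.0 | 84.0 | 37.0 | 2.0 | 41.0 | 122.0 | 112.0 | 119.0 | 2.0 | 83.0 | 9.0 | 8.0 | 0.0 | 9.0 | 5.0 | 1.0 | 0.296
4 | 96 | 2017.0 | Aaron Judge | Yankees | 25.0 | 155.0 | 542.0 | 678.0 | 154.0 | 75.0 | 24.0 | 3.0 | 52.0 | 128.0 | 114.0 | 127.0 | 11.0 | 208.0 | 5.0 | 4.0 | 0.0 | 15.0 | 9.0 | 4.0 | 0.284
5 | 153 | 2018.0 | Jose Ramirez | Indians | 25.0 | 157.0 | 578.0 | 698.0 | 156.0 | 75.0 | 38.0 | 4.0 | 39.0 | 110.0 | 105.0 | 106.0 | 15.0 | 80.0 | 8.0 | 6.0 | 0.0 | 2.0 | 34.0 | 6.0 | 0.270
6 | 69 | 2019.0 | Christian Yelich | Brewers | 27.0 | 130.0 | 489.0 | 580.0 | 161.0 | 85.0 | 29.0 | 3.0 | 44.0 | 100.0 | 97.0 | 80.0 | 16.0 | 118.0 | 8.0 | 3.0 | 0.0 | 8.0 | 30.0 | 2.0 | 0.329
7 | 109 | 2019.0 | Cody Bellinger | Dodgers | 23.0 | 156.0 | 558.0 | 660.0 | 170.0 | 86.0 | 34.0 | 3.0 | 47.0 | 121.0 | 115.0 | 95.0 | 21.0 | 108.0 | 3.0 | 4.0 | 0.0 | 10.0 | 15.0 | 5.0 | 0.305
8 | 100 | 2018.0 | Christian Yelich | Brewers | 26.0 | 147.0 | 574.0 | 651.0 | 187.0 | 110.0 | 34.0 | 7.0 | 36.0 | 118.0 | 110.0 | 68.0 | 2.0 | 135.0 | 7.0 | 2.0 | 0.0 | 14.0 | 22.0 | 4.0 | 0.326
9 | 123 | 2017.0 | Jose Altuve | Astros | 27.0 | 153.0 | 590.0 | 662.0 | 204.0 | 137.0 | 39.0 | 4.0 | 24.0 | 112.0 | 81.0 | 58.0 | 3.0 | 84.0 | 9.0 | 4.0 | 1.0 | 19.0 | 32.0 | 6.0 | 0.346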
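
The sample rows are consistent with the standard baseball definitions AVG = H / AB (rounded to three decimals) and 1B = H - 2B - 3B - HR. That the dataset was built this way is an assumption, but the check below passes on three rows copied from the sample. The helper names are hypothetical.

```python
# Values copied from the first sample rows: (name, AB, H, 2B, 3B, HR, 1B, AVG).
ROWS = [
    ("Mookie Betts", 520, 180, 47, 5, 32, 96, 0.346),
    ("Mike Trout", 471, 147, 24, 4, 39, 80, 0.312),
    ("Jose Altuve", 590, 204, 39, 4, 24, 137, 0.346),
]

def batting_average(hits, at_bats):
    """Hits per at-bat, rounded to three decimals as displayed in the report."""
    return round(hits / at_bats, 3)

def singles(hits, doubles, triples, home_runs):
    """Hits that were not extra-base hits."""
    return hits - doubles - triples - home_runs

for name, ab, h, d, t, hr, one_b, avg in ROWS:
    assert batting_average(h, ab) == avg, name
    assert singles(h, d, t, hr) == one_b, name
print("all sample rows consistent")
```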

Last rows

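(row) | df_index | Season | Name | Team | Age | G | AB | PA | H | 1B | 2B | 3B | HR | R | RBI | BB | IBB | SO | HBP | SF | SH | GDP | SB | CS | AVG
190 | 534 | 2018.0 | Eddie Rosario | Twins | 26.0 | 138.0 | 559.0 | 592.0 | 161.0 | 104.0 | 31.0 | 2.0 | 24.0 | 87.0 | 77.0 | 30.0 | 5.0 | 104.0 | 0.0 | 2.0 | 1.0 | 4.0 | 8.0 | 2.0 | 0.288
191 | 638 | 2019.0 | Christian Vazquez | Red Sox | 28.0 | 138.0 | 482.0 | 521.0 | 133.0 | 83.0 | 26.0 | 1.0 | 23.0 | 66.0 | 72.0 | 33.0 | 3.0 | 101.0 | 0.0 | 3.0 | 3.0 | 17.0 | 4.0 | 2.0 | 0.276
192 | 536 | 2018.0 | Matt Olson | Athletics | 24.0 | 162.0 | 580.0 | 660.0 | 143.0 | 81.0 | 33.0 | 0.0 | 29.0 | 85.0 | 84.0 | 70.0 | 3.0 | 163.0 | 8.0 | 2.0 | 0.0 | 13.0 | 2.0 | 1.0 | 0.247
193 | 544 | 2018.0 | Mallex Smith | Rays | 25.0 | 141.0 | 480.0 | 544.0 | 142.0 | 103.0 | 27.0 | 10.0 | 2.0 | 65.0 | 40.0 | 47.0 | 0.0 | 98.0 | 8.0 | 2.0 | 7.0 | 11.0 | 40.0 | 12.0 | 0.296
194 | 708 | 2017.0 | Kyle Seager | Mariners | 29.0 | 154.0 | 578.0 | 650.0 | 144.0 | 83.0 | 33.0 | 1.0 | 27.0 | 72.0 | 88.0 | 58.0 | 6.0 | 110.0 | 8.0 | 6.0 | 0.0 | 6.0 | 2.0 | 1.0 | 0.249
195 | 1 | 2017.0 | Sean Gilmartin | Mets | 27.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.000
196 | 246 | 2019.0 | Justin Turner | Dodgers | 34.0 | 135.0 | 479.0 | 549.0 | 139.0 | 88.0 | 24.0 | 0.0 | 27.0 | 80.0 | 67.0 | 51.0 | 1.0 | 88.0 | 14.0 | 5.0 | 0.0 | 11.0 | 2.0 | 0.0 | 0.290
197 | 301 | 2017.0 | Aaron Hicks | Yankees | 27.0 | 88.0 | 301.0 | 361.0 | 80.0 | 47.0 | 18.0 | 0.0 | 15.0 | 54.0 | 52.0 | 51.0 | 0.0 | 67.0 | 3.0 | 5.0 | 1.0 | 8.0 | 10.0 | 5.0 | 0.266
198 | 355 | 2017.0 | Josh Reddick | Astros | 30.0 | 134.0 | 477.0 | 540.0 | 150.0 | 99.0 | 34.0 | 4.0 | 13.0 | 77.0 | 82.0 | 43.0 | 1.0 | 72.0 | 0.0 | 12.0 | 1.0 | 9.0 | 7.0 | 3.0 | 0.314
199 | 168 | 2019.0 | J.D. Martinez | Red Sox | 31.0 | 146.0 | 575.0 | 657.0 | 175.0 | 104.0 | 33.0 | 2.0 | 36.0 | 98.0 | 105.0 | 72.0 | 9.0 | 138.0 | 4.0 | 5.0 | 0.0 | 19.0 | 2.0 | 0.0 | 0.304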